Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit warning for missing labels in Multiindex.loc[[...]] (and more) #20770

Merged
merged 13 commits into from
May 1, 2018

Conversation

toobaz
Copy link
Member

@toobaz toobaz commented Apr 20, 2018

Main changes:

  • the presence of at least one key (and of all of them, for the temporary warning) is now checked after building the indexer, using the indexer itself, in a unique place
  • for this to be possible, obj.reindex was replaced with obj._reindex_with_indexers
  • _multi_take is uglier than before... but it's ugly anyway, and must be removed in a future refactoring (in which first indexers are built in all possible code paths, then values are extracted)
  • _has_valid_type was mostly used in "assertion mode" (without looking at its return value), so I changed it to _validate_key and now it is only used in "assertion mode"

Asv (--bench indexing) give

       before           after         ratio
     [3a2e9e6c]       [25711d3d]
+       162±0.2ms          273±5ms     1.68  indexing.NumericSeriesIndexing.time_getitem_array(<class 'pandas.core.indexes.numeric.Float64Index'>)
+         162±1ms        269±0.7ms     1.66  indexing.NumericSeriesIndexing.time_loc_array(<class 'pandas.core.indexes.numeric.Float64Index'>)
+       162±0.6ms          267±1ms     1.65  indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>)
+         164±2ms          270±1ms     1.64  indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Float64Index'>)
+       166±0.6ms        273±0.6ms     1.64  indexing.NumericSeriesIndexing.time_getitem_lists(<class 'pandas.core.indexes.numeric.Float64Index'>)
+       163±0.5ms        267±0.8ms     1.64  indexing.NumericSeriesIndexing.time_getitem_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>)
+         163±3ms        267±0.4ms     1.64  indexing.NumericSeriesIndexing.time_loc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>)
+         412±3μs          664±4μs     1.61  indexing.NumericSeriesIndexing.time_ix_list_like(<class 'pandas.core.indexes.numeric.Int64Index'>)
+        1.08±0ms      1.42±0.05ms     1.32  indexing.NumericSeriesIndexing.time_ix_array(<class 'pandas.core.indexes.numeric.Int64Index'>)
+     1.14±0.01ms      1.31±0.02ms     1.15  indexing.PanelIndexing.time_subset
-       133±0.2ms        121±0.4ms     0.91  frame_methods.Iteration.time_iteritems_indexing
-         442±2ns          399±2ns     0.90  indexing.MethodLookup.time_lookup_iloc
-       145±0.8μs        130±0.8μs     0.90  indexing.IntervalIndexing.time_getitem_list
-         151±2μs          134±1μs     0.89  indexing.AssignTimeseriesIndex.time_frame_assign_timeseries_index
-      48.5±0.8μs       41.2±0.1μs     0.85  indexing.IntervalIndexing.time_getitem_scalar
-        44.3±1μs       35.6±0.1μs     0.80  indexing.NumericSeriesIndexing.time_iloc_list_like(<class 'pandas.core.indexes.numeric.Float64Index'>)
-     3.81±0.02ms      2.06±0.02ms     0.54  indexing.DataFrameNumericIndexing.time_loc_dups

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

notice that FloatIndex was not being checked at all (i.e. #17758 applied to it too), now it is.

In any case, as I wrote above, there are several improvements still to be made, but I want to go gradually because the complexity of this PR is already pretty high for me.

@toobaz toobaz changed the title Multiindex check missing Emit warning for missing labeles in Multiindex.loc[[...]] (and more) Apr 20, 2018
@toobaz
Copy link
Member Author

toobaz commented Apr 21, 2018

@jreback @jorisvandenbossche There is some problem with warnings in Python 2.7.

On my machine,

 pytest pandas/tests/indexing/test_indexing.py

passes and

pytest pandas/tests/indexing/test_loc.py

passes, but

pytest pandas/tests/indexing/test_loc.py  pandas/tests/indexing/test_indexing.py

fails with Did not see expected warning (as in circleci and travis-ci).

Any suggestion (apart from skipping those on Python 2.7)?

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Compat pandas objects compatability with Numpy or Python functions labels Apr 21, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good. need a few more comments. must be not-catching a warning (maybe only in py2).

with catch_warnings(record=True):
assert isna(df.ix[:, [-1]].values).all()
# ix does label-based indexing when having an integer index
with pytest.raises(KeyError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test on the row dim (maybe make these 2 a separate test), may already be an existing test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

with catch_warnings(record=True):
result = dfnu.ix[['E']]
tm.assert_frame_equal(result, expected)
with pytest.raises(KeyError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you a comment here on what you are testing (maybe separate test)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done)

@@ -59,12 +59,22 @@ def test_get_nan():

# ensure that fixing the above hasn't broken get
# with multiple elements
idx = [2, 30]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move into separate test

@@ -119,15 +119,15 @@ def test_loc_getitem_label_out_of_range(self):
typs=['ints', 'uints', 'labels', 'mixed', 'ts'],
fails=KeyError)
self.check_result('label range', 'loc', 'f', 'ix', 'f',
typs=['floats'], fails=TypeError)
typs=['floats'], fails=KeyError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this reflected in the whatsnew?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a change in the API: I just fixed the test, which was checking nothing at all (because test objects for floats were missing).

ax = o._get_axis(axis)
indexer, keyarr = ax._convert_listlike_indexer(key,
kind=self.name)
if indexer is not None and (indexer != -1).all():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here

the list of keys was actually empty).
"""
ax = self.obj._get_axis(axis)
# True indicates missing values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a blank line would help here (not sure what this comment refers)

raise KeyError(
u"None of [{key}] are in the [{axis}]".format(
key=key, axis=self.obj._get_axis_name(axis)))
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the else here

@@ -1352,6 +1370,33 @@ def _has_valid_type(self, key, axis):

return True

def _convert_for_reindex(self, key, axis=None):
if axis is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

if com.is_bool_indexer(key):
key = check_bool_indexer(labels, key)
return labels[key]
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else not needed

elif is_integer(key):
return self._is_valid_integer(key, axis)
assert(self._is_valid_integer(key, axis))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of this assert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, _validate_key should raise an error if the keys are not valid (and they aren't if they are an integer but not a valid integer). Better to test and eventually raise a ValueError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this should go thru _validate_key (the old _has_valid_type). it should raise if indicated by that type of indexer, so yes test for that

@codecov
Copy link

codecov bot commented Apr 21, 2018

Codecov Report

Merging #20770 into master will increase coverage by 0.01%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20770      +/-   ##
==========================================
+ Coverage   91.77%   91.79%   +0.01%     
==========================================
  Files         153      153              
  Lines       49313    49354      +41     
==========================================
+ Hits        45259    45306      +47     
+ Misses       4054     4048       -6
Flag Coverage Δ
#multiple 90.19% <93.75%> (+0.02%) ⬆️
#single 41.92% <56.25%> (+0.03%) ⬆️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.83% <100%> (+0.01%) ⬆️
pandas/core/indexes/base.py 96.64% <100%> (+0.01%) ⬆️
pandas/core/indexing.py 93.55% <93.47%> (+0.41%) ⬆️
pandas/core/indexes/timedeltas.py 91.15% <0%> (-0.07%) ⬇️
pandas/core/indexes/datetimes.py 95.73% <0%> (-0.04%) ⬇️
pandas/io/pytables.py 92.41% <0%> (ø) ⬆️
pandas/io/formats/latex.py 100% <0%> (ø) ⬆️
pandas/core/resample.py 96.07% <0%> (ø) ⬆️
pandas/core/series.py 94.03% <0%> (+0.03%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 563a6ad...c62973b. Read the comment docs.

@pep8speaks
Copy link

pep8speaks commented Apr 22, 2018

Hello @toobaz! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 01, 2018 at 11:03 Hours UTC

@toobaz
Copy link
Member Author

toobaz commented Apr 24, 2018

@jreback ready for me

@TomAugspurger
Copy link
Contributor

Are we doing this for 0.23?

@toobaz
Copy link
Member Author

toobaz commented Apr 26, 2018

Are we doing this for 0.23?

I guess so!

@jreback
Copy link
Contributor

jreback commented Apr 26, 2018

yes we should
need to review again - ok after rc

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Apr 26, 2018
@@ -4881,6 +4881,9 @@ def _ensure_index(index_like, copy=False):
if hasattr(index_like, 'name'):
return Index(index_like, name=index_like.name, copy=copy)

if is_iterator(index_like):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this covers generators as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -186,33 +186,21 @@ def __setitem__(self, key, value):
indexer = self._get_setitem_indexer(key)
self._setitem_with_indexer(indexer, value)

def _has_valid_type(self, k, axis):
def _validate_key(self, k, axis):
raise NotImplementedError()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this an AbstractClassError, and add a doc-string here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this an AbstractClassError,

Pointer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractMethoderror in pandas/core/base.py

@@ -1337,7 +1377,7 @@ def __init__(self, name, obj):
DeprecationWarning, stacklevel=2)
super(_IXIndexer, self).__init__(name, obj)

def _has_valid_type(self, key, axis):
def _validate_key(self, key, axis):
if isinstance(key, slice):
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string to these

@@ -1656,7 +1739,7 @@ class _LocIndexer(_LocationIndexer):
"index is integers), listlike of labels, boolean")
_exception = KeyError

def _has_valid_type(self, key, axis):
def _validate_key(self, key, axis):
ax = self.obj._get_axis(axis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

elif isinstance(key, tuple):
# a tuple should already have been caught by this point
# so don't treat a tuple as a valid indexer
raise IndexingError('Too many indexers')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is hit in tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure:

def test_getitem_setitem_fancy_exceptions(self):

(this is old code I just moved around)

l = len(self.obj._get_axis(axis))

if len(arr) and (arr.max() >= l or arr.min() < -l):
raise IndexError("positional indexers are out-of-bounds")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is hit in tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, moreover this is trivial refactoring.


if len(arr) and (arr.max() >= l or arr.min() < -l):
raise IndexError("positional indexers are out-of-bounds")
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can do w/o the else here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I would have to add a

else:
    return

branch to the if just above, because when the indexer is valid it does not return/raise

return self._get_slice_axis(key, axis=axis)

if isinstance(key, list):
try:
key = np.asarray(key)
except TypeError: # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this was ever needed. I don't think there is any way in which np.asarray([...]) could raise a TypeError. np.asarray([0, [1, 2]]) raises ValueError (catched elsewhere).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right my point was that this was catching something, so let's see what it was (and if not, of course remove)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in https://github.com/pandas-dev/pandas/pull/15504/files#diff-7489e290a23303c4db4f803011ecaf8eR1728 ... but I think it was an unnecessary precaution ( @jorisvandenbossche ?), to the best of my understanding it never catched anything.

@@ -131,6 +132,8 @@ def test_setitem_dtype_upcast(self):
assert is_float_dtype(left['foo'])
assert is_float_dtype(left['baz'])

@pytest.mark.skipif(PY2, reason=("Catching warnings unreliable with "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? where does this come from? this means we are NOT catching a warning. need to pin this down.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I wrote: #20770 (review)

A bug in testing code only, which arises with Python 2 only, and only when two test files are run together, is in the intersection of my "no idea where to start" and "not high priority anyway" categories.

I can open an issue once merged - as there's no simple way (as far as I know) to reproduce otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I said, this must be fixed before merging. This means you are not catching a warning. When you run the full suite does it not print out an uncaught warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, see #20770 (comment)

Copy link
Member Author

@toobaz toobaz Apr 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the full suite case behaves like the case with the two test files)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this would be the only remaining issue

I'm afraid I won't have time for (inherited) docstrings and learning what is a AbstractClassError soon.

I would make an exception in this case

To be honest I don't see to what is an exception been made. This PR fixes 3 bugs in code, and the fact that it identifies an additional, already existing, one in tests is an added bonus.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz I am not questing the refactoring or that you fixed some bugs, just that something that was added is hiding a bug, maybe in an exception. This is much worse than known bugs and very very hard to find.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we all agree that the bug is in the test suite? If yes, I guess I don't have much to add: as you correctly point out, it is a nasty bug, which will be hard to isolate - even harder if we don't merge this (because none of us knows how to reproduce otherwise).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the point is that this refactor created the bug. i have no problem waiting to merge this until this is fixed. would rather have this correct, this is pretty tricky code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the point is that this refactor created the bug

This "but" has no logical meaning in this discussion. Either you agree with my reasoning that the bug is in the test suite, or you think that a bug in the code can lead a test to pass only when called in isolation, and then I ask you to explain me how. In the first case, this refactor didn't "create" any bug. In the second... I'm patiently waiting for an explanation (not for one more repetition of the same illogical claim).

Then, sure, there might be a bug in the code and a bug in the tests suite. That's why I have manually tried the code in the failing tests, and it emits the warning just as it should - feel free to try.

The point is not just 0.23.0, it is that we have no idea of what the problem is, it is certainly not caused by this branch, it will be most probably not fixed soon, this branch will soon become a rebasing nightmare (I know by experience), and the indexing code desperately needs cleanup (and it is already difficult enough).

And of course, the point is also that I still didn't see any logical argument for postponing. You know, I like when discussions are based on actual arguments (i.e. when they are not just a waste of time).

@toobaz
Copy link
Member Author

toobaz commented Apr 26, 2018

I think this deserves a more elaborate whatsnew notice?

Not sure... the notice in 0.21.0 (in principle) already covered this case (it didn't mention special cases with specific kinds of Indexes)

@jreback
Copy link
Contributor

jreback commented Apr 26, 2018

@toobaz the issue is that I have seen the warnings pop up and signal an actual error multiple times
if things are being forced / ignored it is only adding technical debt
i

@toobaz
Copy link
Member Author

toobaz commented Apr 26, 2018

if things are being forced / ignored it is only adding technical debt

In current master they are ignored.

@jreback
Copy link
Contributor

jreback commented Apr 26, 2018

how’s that?

@jorisvandenbossche
Copy link
Member

But, then I also needed to add an extra assert_warning, and didn't look into the actual test (the check_result stuff is rather difficult to understand ..) if this is correct or not.

To conclude myself: yes the added with tm.assert_produces_warning(..) is correct as it is testing a list with missing labels. The fact that the assert was not there was not logical (and a bug in the testing code), and should actually been have investigated when the asserts were added to the test in the first place.

Anyhow, the full test is also a bit bogus. As it only checks that at least one FutureWarning is raised, but the self.check_result tests several things at the same time .. (but that is not for this PR to fix :-))

@toobaz
Copy link
Member Author

toobaz commented Apr 27, 2018

@jorisvandenbossche thanks, I should have applied your fix.

@toobaz
Copy link
Member Author

toobaz commented Apr 27, 2018

@jorisvandenbossche let me know if you see any obvious mistake in my implementation of your fix...

@toobaz
Copy link
Member Author

toobaz commented Apr 27, 2018

@jorisvandenbossche for sure your fix is good, but I'm afraid there's something deeper (also because your fix alone should in principle affect Python 2 and 3, and invocation of the entire test suite or of single files, in the same way)

@jorisvandenbossche
Copy link
Member

Too bad :-) There might be other similar cases where warnings are catched and hidden in the testing code.

It's a bit strange that both are failing. Locally it was only one of two that are failing. (and with your updated branch none, at least when only running the two files)

That said, I am totally fine with skipping those for now, given it is probably a bug in existing testing code.

@toobaz
Copy link
Member Author

toobaz commented Apr 30, 2018

That said, I am totally fine with skipping those for now, given it is probably a bug in existing testing code.

Agree. So OK for 0.23.0?

@toobaz toobaz mentioned this pull request Apr 30, 2018
71 tasks
@jorisvandenbossche
Copy link
Member

@jreback what's your opinion here?

I was not yet able to look at the code changes in detail, but I think you said the changes in itself were fine, so that's ok for me.

I think the extra warning for MultiIndex/FloatIndex would be really nice to include in 0.23.0, so I would favor merging this.

I looked a bit in the failing cases for python 2, and the one we did find (unfortunately not all) was clearly an error in the testing code (a testing for Panel was catching and hiding all warnings, including one for the index, and hence it was not raised anymore for specific tests, not sure why it only was the case on python 2 though).
So personally, I wouldn't keep this out of 0.23.0 for that

@jreback jreback added this to the 0.23.0 milestone May 1, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor changes, lgtm.

@@ -873,6 +873,7 @@ Deprecations
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
- ``IntervalIndex.from_intervals`` is deprecated in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
- ``DataFrame.from_items`` is deprecated. Use :func:`DataFrame.from_dict` instead, or ``DataFrame.from_dict(OrderedDict())`` if you wish to preserve the key order (:issue:`17320`, :issue:`17312`)
- Indexing a ``MultiIndex`` or a ``FloatIndex`` with a list containing some missing keys will now show a ``FutureWarning``, consistently with other types of indexes (:issue:`17758`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add :class: for these

@@ -1107,6 +1108,8 @@ Indexing
- :func:`Index.to_series` now accepts ``index`` and ``name`` kwargs (:issue:`18699`)
- :func:`DatetimeIndex.to_series` now accepts ``index`` and ``name`` kwargs (:issue:`18699`)
- Bug in indexing non-scalar value from ``Series`` having non-unique ``Index`` will return value flattened (:issue:`17610`)
- Bug in indexing with iterator containing only missing keys, which raised no error (:issue:`20748`)
- Fixed inconsistency in ``.ix`` between list and scalar keys when index has integer dtype and does not include desired key(s) (:issue:`20753`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... scalar keys when the index has an integer dtype and does not include the desired ...

axis : int
Dimension on which the indexing is being made

Returns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we dont' put this if it returns None, @TomAugspurger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we don't.

o._get_axis_number(axis))
d[axis] = (keyarr, indexer)
return o._reindex_with_indexers(d, copy=True, allow_dups=True)
except(KeyError, IndexingError) as detail:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if its worth it, but is it possible to surround only the op that could raise the IndexingError? (with the try/except)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't address this. I agree that'd be nice, but at a glance it looks like there are 3-4 places that could raise one of those errors. Seems tricky

@@ -1073,8 +1085,9 @@ def _getitem_axis(self, key, axis=None):
if axis is None:
axis = self.axis or 0

if self._should_validate_iterable(axis):
self._has_valid_type(key, axis)
if is_iterator(key):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this is_iterator needed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't check this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be moved at a later stage... but I would see it as a loss (more general argument in #20748)

axis: int
Dimension on which the indexing is being made

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this I think

@@ -225,6 +221,10 @@ def test_dups_fancy_indexing(self):
result = df.loc[['A', 'A', 'E']]
tm.assert_frame_equal(result, expected)

if PY2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split this test, and use the skip decorator instead (and add the reason)

@TomAugspurger
Copy link
Contributor

Fixed up all but two of the comments. I don't think the tighter exception catching is easy to do correctly.

No idea about checking the key for an iterator. Running the tests now to see if it's hit.

@toobaz
Copy link
Member Author

toobaz commented May 1, 2018

I don't think the tighter exception catching is easy to do correctly.

... and in any case, I plan to soon merge that code in the "non-multi" branch.

(the "multi" branch should be peculiar in how it retrieves values, not in how it creates indexers)

@TomAugspurger
Copy link
Contributor

I plan to soon merge that code in the "non-multi" branch.

Is that going in this PR, or a different one?

@toobaz
Copy link
Member Author

toobaz commented May 1, 2018

Is that going in this PR, or a different one?

Future (it's the "future refactoring" I mention in the description)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 1, 2018 via email

@toobaz toobaz changed the title Emit warning for missing labeles in Multiindex.loc[[...]] (and more) Emit warning for missing labels in Multiindex.loc[[...]] (and more) May 1, 2018
@TomAugspurger TomAugspurger merged commit 901fc64 into pandas-dev:master May 1, 2018
@toobaz toobaz deleted the multiindex_check_missing branch May 1, 2018 14:16
@jorisvandenbossche
Copy link
Member

Some follow-up questions:

  • do we need to keep track of the problem with the warning being catched by other tests (and some of the discussion / things we found out from this PR) in an issue?

  • I asked above about the scope of the warning, and AFAIU, the added warning is only for the case where you pass a list of exact labels? (so fully specified for all levels) Eg .loc[[('lev0_0', 'lev1_3'), ('lev0_1', 'lev1_2'), (...]]
    Do we also need to look at the case where you index level per level? Eg specify the labels for only the first level: .loc[['lev0_0', 'lev0_1']]. For this case, unknown labels are still ignored.

@toobaz
Copy link
Member Author

toobaz commented May 1, 2018

do we need to keep track of the problem with the warning being catched by other tests (and some of the discussion / things we found out from this PR) in an issue?

Sure: #20914

I asked above about the scope of the warning

Indeed, the warning for the partial indexing (including indexing by levels, even where each level is specified) requires a separate approach: see #20916

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Compat pandas objects compatability with Numpy or Python functions Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
5 participants